Abstract. It is well known that Principal Component Analysis (PCA)
is strongly affected by outliers, and a lot of effort has been put into
the robustification of PCA. In this paper we present a new algorithm for
robust PCA minimizing the trimmed reconstruction error. By directly
minimizing over the Stiefel manifold, we avoid deflation as often used by
projection pursuit methods. In contrast to other methods for robust
PCA, our method has no free parameter and is computationally very
efficient. We illustrate the performance on various datasets including
an application to background modeling and subtraction. Our method
performs better than or similarly to current state-of-the-art methods while being
faster.


1 Introduction 

PCA is probably the most common tool for exploratory data analysis, dimensionality
reduction and clustering. It can either be seen as finding the best
low-dimensional subspace approximating the data or as finding the subspace of
highest variance. However, because the variance is not robust, PCA
can be strongly influenced by outliers. Indeed, even one outlier can change the
principal components (PCs) drastically. This phenomenon motivates the development
of robust PCA methods which recover the PCs of the uncontaminated
data. This problem has received a lot of attention in the statistical community and
has recently become a problem of high interest in machine learning.

In the statistical community, two main approaches to robust PCA have been
proposed. The first one is based on the robust estimation of the covariance matrix.
Indeed, having found a robust covariance matrix, one can determine
robust PCs by performing the eigenvalue decomposition of this matrix.
However, it has been shown that robust covariance matrix estimators with desirable
properties, such as positive semidefiniteness and affine equivariance, have
a breakdown point^1 upper bounded by the inverse of the dimensionality [5]. The
second approach is the so-called projection pursuit, where one maximizes
a robust scale measure, instead of the standard deviation, over all possible directions.
Although these methods have the best possible breakdown point of 0.5,
they lead to non-convex, typically non-smooth problems, and the current state of
the art are greedy search algorithms [4], which show poor performance in high
dimensions. Another disadvantage is that robust PCs are computed one by one
using deflation techniques, which often leads to poor results for higher PCs.

^1 The breakdown point of a statistical estimator is, informally speaking, the fraction
of points which can be arbitrarily changed while the estimator remains well defined.

In the machine learning and computer vision communities, matrix factorization
approaches to robust PCA were mostly considered, where one looks
for a decomposition of a data matrix into a low-rank part and a sparse part,
e.g., [22]. The sparse part is either assumed to be scattered uniformly [3]
or it is assumed to be row-wise sparse, corresponding to the model
where an entire observation is corrupted and discarded. While some of these
methods have strong theoretical guarantees, in practice they depend on a regularization
parameter which is non-trivial to choose, as robust PCA is an unsupervised
problem, and default choices often do not perform well, as we
discuss in Section 4. Furthermore, most of these methods are slow, as they have
to compute the SVD of a matrix of the size of the data matrix at each iteration.

As we discuss in Section 2, our formulation of robust PCA is based on the
minimization of a robust version of the reconstruction error over the Stiefel manifold,
which induces orthogonality of robust PCs. This formulation has multiple
advantages. First, it has the maximal possible breakdown point of 0.5. Second, the
interpretation of the objective is very simple and requires no parameter tuning
in the default setting. In Section 3, we propose a new fast TRPCA algorithm for
this optimization problem. Our algorithm computes both orthogonal PCs and
a robust center, hence avoiding the deflation procedure and preliminary robust
centering of data. While our motivation is similar to the one of [15], our optimization
scheme is completely different. In particular, our formulation requires
no additional parameter.


2 Robust PCA 


Notation. All vectors are column vectors and $I_p \in \mathbb{R}^{p \times p}$ denotes the identity
matrix. We are given data $X \in \mathbb{R}^{n \times p}$ with $n$ observations in $\mathbb{R}^p$ (rows correspond
to data points). We assume that the data contains $t$ true observations $T \in \mathbb{R}^{t \times p}$
and $n - t$ outliers $O \in \mathbb{R}^{(n-t) \times p}$ such that $X = T \cup O$ and $T \cap O = \emptyset$. To be
able to distinguish true data from outliers, we require the standard assumption in robust
statistics, that is, $t \ge \lceil n/2 \rceil$. The Stiefel manifold is denoted as
$S_k = \{ U \in \mathbb{R}^{p \times k} \mid U^\top U = I \}$ (the set of orthonormal $k$-frames in $\mathbb{R}^p$).

PCA. Standard PCA has two main interpretations. One can either see it
as finding the $k$-dimensional subspace of maximum variance in the data or the
$k$-dimensional affine subspace with minimal reconstruction error. In this paper we
focus on the second interpretation. Given data $X \in \mathbb{R}^{n \times p}$, the goal is to
find the offset $m \in \mathbb{R}^p$ and $k$ principal components $(u_1, \ldots, u_k) = U \in S_k$, which
describe $\mathcal{A}(m, U) = \{ z \in \mathbb{R}^p \mid z = m + \sum_{j=1}^{k} s_j u_j,\ s_j \in \mathbb{R} \}$, the $k$-dimensional
affine subspace, so that they minimize the reconstruction error

$$\{\hat{m}, \hat{U}\} = \operatorname*{argmin}_{m \in \mathbb{R}^p,\ U \in S_k,\ z_i \in \mathcal{A}(m, U)} \frac{1}{n} \sum_{i=1}^{n} \| z_i - x_i \|_2^2 . \qquad (1)$$

It is well known that $\hat{m} = \frac{1}{n} \sum_{i=1}^{n} x_i$ and the optimal matrix $\hat{U} \in S_k$ is generated
by the top $k$ eigenvectors of the empirical covariance matrix. As $UU^\top$ with $U \in S_k$ is an
orthogonal projection, an equivalent formulation of (1) is given by

$$\{\hat{m}, \hat{U}\} = \operatorname*{argmin}_{m \in \mathbb{R}^p,\ U \in S_k} \frac{1}{n} \sum_{i=1}^{n} \| (UU^\top - I)(x_i - m) \|_2^2 . \qquad (2)$$
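The equivalence between the eigenvector solution and formulation (2) can be checked numerically. The following is a minimal NumPy sketch; the function names `pca` and `reconstruction_error` and the toy data are illustrative assumptions, not part of the original method's code.

```python
import numpy as np

def pca(X, k):
    """Standard PCA: returns the mean m and the top-k eigenvectors U (p x k)."""
    m = X.mean(axis=0)
    Xc = X - m
    cov = Xc.T @ Xc / X.shape[0]            # empirical covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)  # eigenvalues in ascending order
    U = eigvecs[:, ::-1][:, :k]             # top-k eigenvectors as columns
    return m, U

def reconstruction_error(X, m, U):
    """Objective of (2): mean squared distance to the affine subspace A(m, U)."""
    Xc = X - m
    residual = (Xc @ U) @ U.T - Xc          # rows are (U U^T - I)(x_i - m)
    return np.mean(np.sum(residual ** 2, axis=1))

# Toy data (assumption): 200 points in R^5 with two dominant directions.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5)) * np.array([3.0, 2.0, 1.0, 0.1, 0.1])
m, U = pca(X, k=2)
err = reconstruction_error(X, m, U)
```

At the PCA solution, the value of (2) equals the sum of the discarded eigenvalues of the empirical covariance matrix, which gives a simple sanity check for any implementation.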


Robust PCA. When the data $X$ does not contain outliers ($X = T$), we refer
to the outcome of standard PCA computed for the true data $T$ as
$\{m_T, U_T\}$. When there are some outliers in the data $X$, i.e., $X = T \cup O$, the
result $\{\hat{m}, \hat{U}\}$ of PCA can be significantly different from $\{m_T, U_T\}$ computed
for the true data $T$. The reason is the non-robust squared $\ell_2$-norm involved in
the formulations (1) and (2). It is well known that PCA has a breakdown point
of zero, that is, a single outlier can already distort the components arbitrarily. As
outliers are frequently present in applications, robust versions of PCA are crucial
for data analysis with the goal of recovering the true PCA solution $\{m_T, U_T\}$
from the contaminated data $X$.
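The zero breakdown point is easy to reproduce. In the small sketch below (an illustration with made-up data, not an experiment from this paper), 100 clean points are spread along the first coordinate axis and a single gross outlier is placed along the second; the helper name `first_pc` is an assumption.

```python
import numpy as np

def first_pc(X):
    """First principal component of X (rows are observations)."""
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Vt[0]

rng = np.random.default_rng(0)
# Clean data: large spread along the x-axis, tiny spread along the y-axis.
T = np.column_stack([rng.normal(size=100), 0.05 * rng.normal(size=100)])
# Contaminated data: the same points plus one outlier far along the y-axis.
X = np.vstack([T, [[0.0, 1000.0]]])

pc_clean = first_pc(T)          # essentially aligned with the x-axis
pc_contaminated = first_pc(X)   # essentially aligned with the y-axis
```

One observation out of 101 is enough to rotate the first component by roughly 90 degrees, because its squared distance dominates the empirical variance.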

Unlike for standard PCA, robust formulations of PCA based on the maximization
of the variance (the projection-pursuit approach),
eigenvectors of the empirical covariance matrix (construction of a robust covariance
matrix), or the minimization of the reconstruction error (as extension
of (2)) are not equivalent. Hence, there is no universal approach to robust PCA,
and the choice can depend on applications and assumptions on outliers. Moreover,
due to the inherited non-convexity of standard PCA, they lead to NP-hard
problems. The known approaches for robust PCA either follow, to some extent,
greedy or locally optimal optimization techniques, e.g., [4], [13], [19], [21], or compute
convex relaxations.

In this paper we aim at a method for robust PCA based on the minimization
of a robust version of the reconstruction error and adopt the classical outlier
model where entire observations (corresponding to rows in the data matrix
$X$) correspond to outliers. In order to introduce the trimmed reconstruction
error estimator for robust PCA, we employ the analogy with the least
trimmed squares estimator for robust regression. We denote by
$r_i(m, U) = \| (UU^\top - I)(x_i - m) \|_2^2$ the reconstruction error of observation $x_i$ for the given
affine subspace parameterized by $(m, U)$. Then the trimmed reconstruction error
is defined to be the average of the $t$ smallest reconstruction errors $r_i(m, U)$,

$$R(m, U) = \frac{1}{t} \sum_{i=1}^{t} r_{(i)}(m, U), \qquad (3)$$

where $r_{(1)}(m, U) \le \cdots \le r_{(n)}(m, U)$ are in nondecreasing order and $t$, with
$\lceil n/2 \rceil \le t \le n$, should be a lower bound on the number of true examples $T$. If
such an estimate is not available, as is common in unsupervised learning, one
can set by default $t = \lceil n/2 \rceil$. With the latter choice it is straightforward to see
that the corresponding PCA estimator has the maximum possible breakdown
point of 0.5, that is, up to 50% of the data points can be arbitrarily corrupted.
With the default choice our method has no free parameter except the rank $k$.
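As a small illustration of (3), the trimmed error of a list of per-observation reconstruction errors can be computed as follows. This is a sketch in plain Python; the function name `trimmed_error` and the example values are assumptions made for illustration.

```python
import math

def trimmed_error(errors, t=None):
    """Average of the t smallest reconstruction errors, as in (3).

    errors: per-observation reconstruction errors r_i(m, U).
    t: number of observations kept; defaults to ceil(n / 2).
    """
    n = len(errors)
    if t is None:
        t = math.ceil(n / 2)
    if not math.ceil(n / 2) <= t <= n:
        raise ValueError("t must satisfy ceil(n/2) <= t <= n")
    return sum(sorted(errors)[:t]) / t

errors = [0.1, 0.2, 0.1, 50.0, 0.3, 80.0]    # two gross outliers
robust_value = trimmed_error(errors)          # default t = 3 ignores the outliers
plain_mean = trimmed_error(errors, t=6)       # t = n recovers the ordinary mean
```

With the default $t = \lceil n/2 \rceil$ the two large errors never enter the objective, whereas for $t = n$ they dominate it.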

The minimization of the trimmed reconstruction error (3) then leads to a
simple and intuitive formulation of robust PCA

$$\operatorname*{argmin}_{m \in \mathbb{R}^p,\ U \in S_k} R(m, U) = \operatorname*{argmin}_{m \in \mathbb{R}^p,\ U \in S_k} \frac{1}{t} \sum_{i=1}^{t} r_{(i)}(m, U). \qquad (4)$$

Note that the estimation of the subspace $U$ and the center $m$ is done jointly. This
is in contrast to [22], where the data has to be centered by
a separate robust method, which can lead to quite large errors in the estimation
of the true PCA components. The same criterion (4) has been proposed by [15];
see also [23] for a slightly different version. While both papers state that the
direct minimization of (4) would be desirable, [15] solves a relaxation of (4)
into a convex problem while [23] smooths the problem and employs deterministic
annealing. Both approaches introduce an additional regularization parameter
controlling the number of outliers. It is non-trivial to choose this parameter.

3 TRPCA: Minimizing Trimmed Reconstruction Error 
on the Stiefel Manifold 

In this section, we introduce TRPCA, our algorithm for the minimization of the
trimmed reconstruction error (4). We first reformulate the objective of (4), as it is
neither convex, nor concave, nor smooth, even if $m$ is fixed. While the resulting
optimization problem is still non-convex, we propose an efficient optimization
scheme on the Stiefel manifold with monotonically decreasing objective. Note
that all proofs of this section can be found in the supplementary material.

3.1 Reformulation and First Properties 

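The reformulation of (4) is based on the following simple identity. Let $\tilde{x}_i = x_i - m$
and $U \in S_k$, then

$$r_i(m, U) = \| (UU^\top - I)(x_i - m) \|_2^2 = -\| U^\top \tilde{x}_i \|_2^2 + \| \tilde{x}_i \|_2^2 =: \tilde{r}_i(m, U). \qquad (5)$$

The equality holds only on the Stiefel manifold. Let $\tilde{r}_{(1)}(m, U) \le \cdots \le \tilde{r}_{(n)}(m, U)$,
then we get the alternative formulation of (4),

$$\operatorname*{argmin}_{m \in \mathbb{R}^p,\ U \in S_k} \tilde{R}(m, U), \qquad \tilde{R}(m, U) = \frac{1}{t} \sum_{i=1}^{t} \tilde{r}_{(i)}(m, U). \qquad (6)$$

While (6) is still non-convex, we show in the next proposition that for fixed $m$
the function $\tilde{R}(m, U)$ is concave on $\mathbb{R}^{p \times k}$. This will allow us to employ a simple
optimization technique based on linearization of this concave function.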
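For completeness, identity (5) follows in a few lines from $U^\top U = I$. The derivation below is a worked expansion added for illustration, using only the symbols defined above.

```latex
\begin{align*}
\|(UU^\top - I)\tilde{x}_i\|_2^2
  &= \tilde{x}_i^\top (UU^\top - I)^\top (UU^\top - I)\,\tilde{x}_i \\
  &= \tilde{x}_i^\top \bigl(U U^\top U U^\top - 2\,UU^\top + I\bigr)\tilde{x}_i \\
  &= \tilde{x}_i^\top \bigl(I - UU^\top\bigr)\tilde{x}_i
     \qquad \text{(using } U^\top U = I\text{)} \\
  &= \|\tilde{x}_i\|_2^2 - \|U^\top \tilde{x}_i\|_2^2 .
\end{align*}
```

The third line is the only place where orthonormality of the columns of $U$ is used, which is why the two expressions differ away from the Stiefel manifold.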
Proposition 1. For fixed $m \in \mathbb{R}^p$ the function $\tilde{R}(m, U): \mathbb{R}^{p \times k} \to \mathbb{R}$ defined
in (6) is concave in $U$.

Proof. We have $\tilde{r}_i(m, U) = -\| U^\top \tilde{x}_i \|_2^2 + \| \tilde{x}_i \|_2^2$. As $\| U^\top \tilde{x}_i \|_2^2$ is convex in $U$, we deduce
that $\tilde{r}_i(m, U)$ is concave in $U$. The sum of the $t$ smallest concave functions out of
$n \ge t$ concave functions is concave, as it can be seen as the pointwise minimum
of all possible $\binom{n}{t}$ sums of $t$ of the concave functions, e.g., [2].

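The iterative scheme uses a linearization of $\tilde{R}(m, U)$ in $U$. For that we need
to characterize the superdifferential of the concave function $\tilde{R}(m, U)$.

Proposition 2. Let $m$ be fixed. The superdifferential $\partial \tilde{R}(m, U)$ of
$\tilde{R}(m, U): \mathbb{R}^{p \times k} \to \mathbb{R}$ is given as

$$\partial \tilde{R}(m, U) = \Bigl\{ -\frac{2}{t} \sum_{i \in I} \alpha_i (x_i - m)(x_i - m)^\top U \;\Big|\; \sum_{i=1}^{n} \alpha_i = t,\ 0 \le \alpha_i \le 1 \Bigr\}, \qquad (7)$$

where $I = \{ i \mid \tilde{r}_i(m, U) \le \tilde{r}_{(t)}(m, U) \}$ with $\tilde{r}_{(1)}(m, U) \le \cdots \le \tilde{r}_{(n)}(m, U)$.

Proof. We reduce it to a well-known case. We can write $\tilde{R}(m, U)$ as

$$\tilde{R}(m, U) = \min_{0 \le \alpha_i \le 1,\ \sum_{i=1}^{n} \alpha_i = t} \ \frac{1}{t} \sum_{i=1}^{n} \alpha_i \tilde{r}_i(m, U), \qquad (8)$$

that is, a minimum of a parameterized set of concave functions. As the parameter
set is compact and the functions depend continuously on the parameter (see Theorem 4.4.2 in [7]), we have

$$\partial \tilde{R}(m, U) = \operatorname{conv}\Bigl( \bigcup_{\alpha \in I(U)} \partial \Bigl( \frac{1}{t} \sum_{i=1}^{n} \alpha_i \tilde{r}_i(m, U) \Bigr) \Bigr) = \operatorname{conv}\Bigl( \bigcup_{\alpha \in I(U)} \frac{1}{t} \sum_{i=1}^{n} \alpha_i \, \partial \tilde{r}_i(m, U) \Bigr), \qquad (9)$$

where $I(U) = \{ \alpha \mid \frac{1}{t} \sum_{i=1}^{n} \alpha_i \tilde{r}_i(m, U) = \tilde{R}(m, U),\ \sum_{i=1}^{n} \alpha_i = t,\ 0 \le \alpha_i \le 1,\ i = 1, \ldots, n \}$
and $\operatorname{conv}(S)$ denotes the convex hull of $S$. Finally, using that $\tilde{r}_i(m, U)$
is differentiable with $\partial \tilde{r}_i(m, U) = \{ -2 (x_i - m)(x_i - m)^\top U \}$ yields the result.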


3.2 Minimization Algorithm 

The algorithm for the minimization of (6) is based on block-coordinate descent in
$m$ and $U$. For the minimization in $U$ we use that $\tilde{R}(m, U)$ is concave for fixed
$m$. Let $G \in \partial \tilde{R}(m, U^k)$, then by definition of the supergradient of a concave
function,

$$\tilde{R}(m, U^{k+1}) \le \tilde{R}(m, U^k) + \langle G, U^{k+1} - U^k \rangle . \qquad (10)$$

The minimization of the linear upper bound on the Stiefel manifold can be done
in closed form, see Lemma 1 below. For that we use a modified version of a
result of [12]. Before giving the proof, we introduce the polar decomposition of a
matrix $G \in \mathbb{R}^{p \times k}$, which is defined to be $G = QP$, where $Q \in S_k$ is an orthonormal
matrix of size $p \times k$ and $P$ is a symmetric positive semidefinite matrix of size
$k \times k$. We denote the factor $Q$ of $G$ by $\mathrm{Polar}(G)$. The polar can be computed in
$O(pk^2)$ for $p \ge k$ [12] as $\mathrm{Polar}(G) = UV^\top$ (see Theorem 7.3.2 in [8]) using the
(thin) SVD of $G$, $G = U \Sigma V^\top$. However, faster methods have been proposed, see [6],
which do not even require the computation of the SVD.
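The SVD route to the polar factor takes only a few lines. The following NumPy sketch (the function name `polar` and the random test matrix are assumptions for illustration) computes $Q = \mathrm{Polar}(G)$ from the thin SVD and recovers the symmetric factor as $P = Q^\top G$.

```python
import numpy as np

def polar(G):
    """Orthonormal polar factor Q of G = Q P, via the thin SVD G = U S V^T."""
    U, _, Vt = np.linalg.svd(G, full_matrices=False)
    return U @ Vt

rng = np.random.default_rng(0)
G = rng.normal(size=(6, 3))   # p = 6, k = 3
Q = polar(G)                  # p x k matrix with orthonormal columns
P = Q.T @ G                   # k x k symmetric positive semidefinite factor
```

This is the simple $O(pk^2)$ variant; the SVD-free methods mentioned above are not shown here.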

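Lemma 1. Let $G \in \mathbb{R}^{p \times k}$, with $k \le p$, and denote by $\sigma_i(G)$, $i = 1, \ldots, k$, the
singular values of $G$. Then $\min_{U \in S_k} \langle G, U \rangle = -\sum_{i=1}^{k} \sigma_i(G)$, with minimizer
$U^* = -\mathrm{Polar}(G)$. If $G$ is of full rank, then $\mathrm{Polar}(G) = G (G^\top G)^{-1/2}$.

Proof. Let $G = U \Sigma V^\top$ be the full SVD of $G$, that is, $U \in O(p)$, $V \in O(k)$ and $\Sigma \in \mathbb{R}^{p \times k}$, where
$O(m)$ denotes the set of orthogonal matrices in $\mathbb{R}^{m \times m}$. Then

$$\min_{O \in S_k} \langle G, O \rangle = \min_{O \in S_k} \langle \Sigma, U^\top O V \rangle = \min_{W \in S_k} \sum_{i=1}^{k} \sigma_i(G) W_{ii} \ge -\sum_{i=1}^{k} \sigma_i(G). \qquad (11)$$

The lower bound is realized by $W = -I_{p \times k}$, the first $k$ columns of $-I_p$, that is, by
$O = -U I_{p \times k} V^\top \in S_k$, which is equal to $-\mathrm{Polar}(G)$.
Indeed, $\langle U \Sigma V^\top, -U I_{p \times k} V^\top \rangle = -\operatorname{trace}(\Sigma^\top I_{p \times k}) = -\sum_{i=1}^{k} \sigma_i(G)$. The final statement
follows from the proof of Theorem 7.3.2 in [8].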
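Lemma 1 can be checked empirically. The sketch below is an illustration under assumed names and random data: it compares the value attained by $-\mathrm{Polar}(G)$ with the values at randomly drawn points of the Stiefel manifold, generated by QR factorization of Gaussian matrices.

```python
import numpy as np

def polar(G):
    """Orthonormal polar factor of G via the thin SVD."""
    U, _, Vt = np.linalg.svd(G, full_matrices=False)
    return U @ Vt

rng = np.random.default_rng(1)
p, k = 8, 3
G = rng.normal(size=(p, k))
sigma_sum = np.linalg.svd(G, compute_uv=False).sum()

U_star = -polar(G)
best = np.sum(G * U_star)     # <G, U*> = trace(G^T U*)

# Random points on the Stiefel manifold should never give a smaller value.
others = []
for _ in range(1000):
    W, _ = np.linalg.qr(rng.normal(size=(p, k)))
    others.append(np.sum(G * W))
```

Here `best` coincides with `-sigma_sum` up to rounding, and every entry of `others` is at least as large.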


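Algorithm 1 TRPCA

Input: $X$, $t$, $d$, $U^0 \in S$, and $m^0$ as the median of $X$, tolerance $\varepsilon$
Output: robust center $m^*$ and robust PCs $U^*$
repeat for $k = 1, 2, \ldots$
    Center data $\tilde{X}^k = \{ \tilde{x}_i^k = x_i - m^k,\ i = 1, \ldots, n \}$
    Compute supergradient $G(U^k)$ of $\tilde{R}(m^k, U^k)$ for fixed $m^k$
    Update $U^{k+1} = -\mathrm{Polar}(G(U^k))$
    Update $m^{k+1} = \frac{1}{t} \sum_{i \in \mathcal{I}^{k'}} x_i$, where $\mathcal{I}^{k'}$ are the indices of the $t$ smallest
        $r_i(m^k, U^{k+1})$, $i = 1, \ldots, n$
until relative descent below $\varepsilon$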
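The steps of Algorithm 1 can be sketched compactly in NumPy. This is a hedged illustration, not the authors' released code: the function names, the random QR initialization of $U^0$, the `max_iter` cap, the exact form of the relative-descent test, and deterministic tie-breaking via `argsort` are all assumptions. The supergradient uses the indicator of the $t$ smallest errors as the weights $\alpha_i$.

```python
import numpy as np

def polar(G):
    U, _, Vt = np.linalg.svd(G, full_matrices=False)
    return U @ Vt

def errors(X, m, U):
    """Reconstruction errors r_i(m, U) for all rows of X, using identity (5)."""
    Xc = X - m
    return np.sum(Xc ** 2, axis=1) - np.sum((Xc @ U) ** 2, axis=1)

def trpca(X, k, t=None, tol=1e-8, max_iter=500, seed=0):
    n, p = X.shape
    t = int(np.ceil(n / 2)) if t is None else t
    rng = np.random.default_rng(seed)
    U, _ = np.linalg.qr(rng.normal(size=(p, k)))     # assumed random start U^0
    m = np.median(X, axis=0)                         # m^0: coordinate-wise median
    obj = np.sort(errors(X, m, U))[:t].mean()
    history = [obj]
    for _ in range(max_iter):
        keep = np.argsort(errors(X, m, U))[:t]       # t smallest errors at (m, U)
        Xc = X[keep] - m
        G = -(2.0 / t) * Xc.T @ (Xc @ U)             # a supergradient at U
        U = -polar(G)                                # closed-form step (Lemma 1)
        keep = np.argsort(errors(X, m, U))[:t]
        m = X[keep].mean(axis=0)                     # mean of the retained points
        new_obj = np.sort(errors(X, m, U))[:t].mean()
        history.append(new_obj)
        if obj - new_obj <= tol * max(abs(obj), 1e-12):
            break
        obj = new_obj
    return m, U, history

# Toy usage (assumed data): a 2-dimensional subspace in R^20 with 30% outliers.
rng = np.random.default_rng(0)
U_true, _ = np.linalg.qr(rng.normal(size=(20, 2)))
T = rng.uniform(-1, 1, size=(140, 2)) @ U_true.T + 0.01 * rng.normal(size=(140, 20))
O = rng.uniform(0, 5, size=(60, 20))
m, U, history = trpca(np.vstack([T, O]), k=2)
```

The list `history` records the trimmed objective after each iteration and is non-increasing up to rounding. In practice one would run several random restarts and keep the solution with the smallest final objective.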


Given that $U$ is fixed, the center $m$ can be updated simply as the mean of
the points realizing the current objective of (6), that is, the points realizing the
$t$ smallest reconstruction errors. Finally, although the objective of (6) is neither
convex nor concave in $m$, we prove monotonic descent of Algorithm 1.

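Theorem 1. The following holds for Algorithm 1. At every iteration, either
$\tilde{R}(m^{k+1}, U^{k+1}) < \tilde{R}(m^k, U^k)$ or the algorithm terminates.

Proof. Let $m^k$ be fixed and $G(U^k) \in \partial \tilde{R}(m^k, U^k)$, then from (10) we have

$$\tilde{R}(m^k, U) \le \tilde{R}(m^k, U^k) - \langle G(U^k), U^k \rangle + \langle G(U^k), U \rangle . \qquad (12)$$

The minimizer $U^{k+1} = \operatorname*{argmin}_{U \in S_k} \langle G(U^k), U \rangle$ over the Stiefel manifold can be
computed via Lemma 1 as $U^{k+1} = -\mathrm{Polar}(G(U^k))$. Thus we get immediately
$\tilde{R}(m^k, U^{k+1}) \le \tilde{R}(m^k, U^k)$.
After the update of $U^{k+1}$ we compute $\mathcal{I}^{k'}$, which are the indices of the $t$ smallest
$r_i(m^k, U^{k+1})$, $i = 1, \ldots, n$. If there are ties, then they are broken randomly. For
fixed $U^{k+1}$ and fixed $\mathcal{I}^{k'}$ the minimizer of the objective

$$\sum_{i \in \mathcal{I}^{k'}} \bigl( -\| (U^{k+1})^\top (x_i - m) \|_2^2 + \| x_i - m \|_2^2 \bigr) \qquad (13)$$

is given by $m^{k+1} = \frac{1}{t} \sum_{i \in \mathcal{I}^{k'}} x_i$, which yields
$\frac{1}{t} \sum_{i \in \mathcal{I}^{k'}} \tilde{r}_i(m^{k+1}, U^{k+1}) \le \tilde{R}(m^k, U^{k+1})$.
After the computation of $m^{k+1}$, the set $\mathcal{I}^{k'}$ need no longer correspond to the $t$ smallest
reconstruction errors $\tilde{r}_i(m^{k+1}, U^{k+1})$. However, taking the $t$ smallest ones only
further reduces the objective, $\tilde{R}(m^{k+1}, U^{k+1}) \le \frac{1}{t} \sum_{i \in \mathcal{I}^{k'}} \tilde{r}_i(m^{k+1}, U^{k+1})$. This
yields finally the result, $\tilde{R}(m^{k+1}, U^{k+1}) \le \tilde{R}(m^k, U^k)$.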

The objective is non-smooth and neither convex nor concave. The Stiefel
manifold is a non-convex constraint set. These facts make the formulation of
critical point conditions challenging. Thus, while potentially stronger convergence
results like convergence to a critical point are appealing, they are currently
out of reach. However, as we will see in Section 4, Algorithm 1 yields good empirical
results, even beating state-of-the-art methods based on convex relaxations
or other non-convex formulations.

3.3 Complexity and Discussion 

The computational cost of each iteration of Algorithm 1 is dominated by $O(pk^2)$
for computing the polar and $O(pkn)$ for a supergradient of $\tilde{R}(m, U)$ and, thus,
has total cost $O(pk(k + n))$. We compare this to the cost of the proximal method
in [3], [20] for minimizing $\min_{X = A + E} \|A\|_* + \lambda \|E\|_1$. In each iteration, the dominating
cost is $O(\min\{pn^2, np^2\})$ for the SVD of a matrix of size $p \times n$. If the
natural condition $k \ll \min\{p, n\}$ holds, we observe that the computational cost
of TRPCA is significantly better. Thus, even though we do 10 random restarts
with different starting vectors, our TRPCA is still faster than all competing
methods, which can also be seen from the runtimes in Table 1.
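To get a feeling for the gap, the two per-iteration operation counts can be evaluated for the sizes of the water surface data set used in Section 4.3 ($n = 633$, $p = 128 \times 160$) with $k = 10$. The snippet below is plain arithmetic that ignores the constants hidden in the $O(\cdot)$ notation, so it indicates orders of magnitude only.

```python
p, n, k = 128 * 160, 633, 10

trpca_per_iter = p * k * (k + n)             # O(pk(k + n)) for TRPCA
svd_per_iter = min(p * n**2, n * p**2)       # O(min{pn^2, np^2}) for a full SVD

ratio = svd_per_iter / trpca_per_iter
print(trpca_per_iter, svd_per_iter, round(ratio, 1))
```

Under these assumptions the SVD-based iteration needs roughly 60 times more operations, which leaves room for the 10 random restarts mentioned above.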

In [15], a relaxed version of the trimmed reconstruction error is minimized:

$$\min \ \| X - 1_n m^\top - S U^\top - O \|_F^2 + \lambda \|O\|_{2,1}, \qquad (14)$$

where $\|O\|_{2,1}$ is added in order to enforce row-wise sparsity of $O$. The optimization
is done via an alternating scheme. However, the disadvantage of this
formulation is that it is difficult to adjust the number of outliers via the choice
of $\lambda$, and it thus requires multiple runs of the algorithm to find a suitable range,
whereas in our formulation the number of outliers $n - t$ can be directly controlled
by the user or $t$ can be set to the default value $\lceil n/2 \rceil$.
4 Experiments

We compare our TRPCA algorithm (the code is available for download at [18])
with the following robust PCA methods: ORPCA [15], LLD^2, HRPCA [21],
standard PCA, and true PCA on the true data $T$ (ground truth). For background
subtraction, we also compare our algorithm with PCP [3] and RPCA [19], although
the latter two algorithms are developed for a different outlier model.

To get the best performance of LLD and ORPCA, we run both algorithms
with different values of the regularization parameters to set the number of zero
rows (observations) in the outlier matrix equal to $\tilde{t}$ (which increases runtime
significantly). The HRPCA algorithm has the same parameter $t$ as our method.

We append (0.5) to an algorithm name if the default value $\tilde{t} = \lceil n/2 \rceil$ is
used; otherwise, we use the ground truth information $\tilde{t} = |T|$. As performance
measure we use the reconstruction error relative to the reconstruction error of
the true data (which is achieved by PCA on the true data only):

$$\mathrm{tre}(U, m) = \frac{1}{t} \sum_{\{i \,\mid\, x_i \in T\}} \bigl( r_i(m, U) - r_i(m_T, U_T) \bigr), \qquad (15)$$

where $\{m_T, U_T\}$ is the true PCA of $T$ and it holds that $\mathrm{tre}(U, m) \ge 0$. The
smaller $\mathrm{tre}(U, m)$, i.e., the closer the estimates $\{m, U\}$ to $\{m_T, U_T\}$, the better.
We choose datasets which are computationally feasible for all methods.
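The performance measure (15) is straightforward to evaluate once the true data are known. The following NumPy sketch uses assumed helper names (`errors`, `pca`, `tre`) and infers the number of components from the shape of $U$.

```python
import numpy as np

def errors(X, m, U):
    """Reconstruction errors r_i(m, U) for all rows of X."""
    Xc = X - m
    return np.sum(Xc ** 2, axis=1) - np.sum((Xc @ U) ** 2, axis=1)

def pca(X, k):
    """Standard PCA via the SVD of the centered data."""
    m = X.mean(axis=0)
    _, _, Vt = np.linalg.svd(X - m, full_matrices=False)
    return m, Vt[:k].T

def tre(T, m, U):
    """Relative true reconstruction error (15), evaluated on the true data T."""
    m_T, U_T = pca(T, U.shape[1])
    return np.mean(errors(T, m, U) - errors(T, m_T, U_T))
```

Because standard PCA on $T$ minimizes the mean reconstruction error over all $(m, U)$, the returned value is non-negative and vanishes when the estimate spans the same affine subspace as the true PCA.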
Fig. 1. First row, left to right: 1) Data1, $p = 100$, $\sigma_o = 2$; 2) Data1, $p = 20$, $\sigma_o = 2$; 3)
Data2, $p = 100$, $\sigma_o = 0.35$. Second row, left to right: 1) Data2, $p = 20$, $\sigma_o = 0.35$; 2)
USPS10, $k = 1$; 3) USPS10, $k = 10$.


^2 Note that the LLD algorithm and the OPRPCA algorithm [22] are equivalent.
4.1 Synthetic Data Sets

We sample uniformly at random a subspace of dimension $k$ spanned by $U \in S_k$
and generate the true data $T \in \mathbb{R}^{t \times p}$ as $T = AU^\top + E$, where the entries of
$A \in \mathbb{R}^{t \times k}$ are sampled uniformly on $[-1, 1]$ and the noise $E \in \mathbb{R}^{t \times p}$ has Gaussian
entries distributed as $\mathcal{N}(0, \sigma_T)$. We consider two types of outliers: (Data1) the
outliers $O \in \mathbb{R}^{o \times p}$ are uniform samples from $[0, \sigma_o]^p$; (Data2) the outliers are
samples from a random half-space: let $w$ be sampled uniformly at random from
the unit sphere and let $x \sim \mathcal{N}(0, \sigma_o I)$, then an outlier $o_i \in \mathbb{R}^p$ is generated as
$o_i = x - \max\{\langle x, w \rangle, 0\} w$. For Data2, we also downscale the true data by a factor of 0.5.
We always set $n = t + o = 200$, $k = 5$, and $\sigma_T = 0.05$ and construct data sets
for different fractions of outliers $\lambda = o/n \in \{0.1, 0.2, 0.3, 0.4, 0.45\}$. For every $\lambda$
we sample 5 data sets and report mean and standard deviation of the relative
true reconstruction error $\mathrm{tre}(U, m)$.
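The two synthetic settings can be generated along the following lines. This is a sketch under stated assumptions: $\sigma_T$ and $\sigma_o$ are treated as standard deviations, a single direction $w$ is shared by all Data2 outliers, the random subspace is obtained from the QR factorization of a Gaussian matrix, and the function names are illustrative.

```python
import numpy as np

def random_stiefel(p, k, rng):
    """Orthonormal basis of a random k-dimensional subspace of R^p."""
    Q, _ = np.linalg.qr(rng.normal(size=(p, k)))
    return Q

def make_data(p, frac_outliers, kind, sigma_o, n=200, k=5, sigma_t=0.05, seed=0):
    """Returns true data T (t x p), outliers O (o x p) and the true basis U."""
    rng = np.random.default_rng(seed)
    o = int(round(frac_outliers * n))
    t = n - o
    U = random_stiefel(p, k, rng)
    A = rng.uniform(-1.0, 1.0, size=(t, k))
    T = A @ U.T + rng.normal(scale=sigma_t, size=(t, p))
    if kind == "Data1":
        O = rng.uniform(0.0, sigma_o, size=(o, p))         # samples from [0, sigma_o]^p
    elif kind == "Data2":
        w = rng.normal(size=p)
        w /= np.linalg.norm(w)                             # uniform on the unit sphere
        x = rng.normal(scale=sigma_o, size=(o, p))
        O = x - np.maximum(x @ w, 0.0)[:, None] * w        # push into the half-space <o, w> <= 0
        T = 0.5 * T                                        # downscale the true data
    else:
        raise ValueError("kind must be 'Data1' or 'Data2'")
    return T, O, U

T, O, U = make_data(p=20, frac_outliers=0.3, kind="Data1", sigma_o=2.0)
```

Stacking `T` and `O` row-wise gives the contaminated data matrix $X$ with $n = 200$ rows.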

4.2 Partially Synthetic Data Set 

We use USPS, a dataset of $16 \times 16$ images of handwritten digits. We use digits
1 as true observations $T$ and digits 0 as outliers $O$ and mix them in different
proportions. We refer to this data set as USPS10; the results can be found in
Fig. 1. Another similar experiment is on the MNIST data set of $28 \times 28$ images
of handwritten digits. We use digits 1 (or 7) as true observations $T$ and all
other digits 0, 2, 3, ..., 9 as outliers $O$ (each taken in equal proportion). We mix
true data and outliers in different proportions; the results can be found in
Fig. 2 (or Fig. 3), where we excluded LLD due to its high computational time,
see Table 1. We notice that the TRPCA algorithm with the parameter value $\tilde{t} = t$
(ground truth information) performs almost perfectly and outperforms all other
methods, while the default version of TRPCA with parameter $\tilde{t} = \lceil n/2 \rceil$ shows
slightly worse performance. The fact that TRPCA simultaneously estimates the
robust center $m$ positively influences the overall performance of the algorithm,
see, e.g., the experiments for background subtraction and modeling in Section 4.3
and the additional ones in the supplementary material.


4.3 Background Modeling and Subtraction 

In [19] and [3] robust PCA has been proposed as a method for background
modeling and subtraction. While we are not claiming that robust PCA is the
best method to do this, it is an interesting test for robust PCA. The data $X$
are the image frames of a video sequence. The idea is that slight change in the
background leads to a low-rank variation of the data, whereas the foreground
changes cannot be modeled by this and can be considered as outliers. Thus,
with the estimates $m^*$ and $U^*$ of the robust PCA methods, the solution of the
background subtraction and modeling problem is given as

$$x_i^b = m^* + U^* (U^*)^\top (x_i - m^*), \qquad (16)$$

where $x_i^b$ is the background of frame $i$ and its foreground is simply $x_i^f = x_i - x_i^b$.
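Given estimates of the center and the components, the decomposition (16) amounts to one projection per frame. A minimal NumPy sketch follows; the function name `split_frames` is an assumption, and frames are stored as rows of `X`, as in the notation of Section 2.

```python
import numpy as np

def split_frames(X, m, U):
    """Background and foreground of every frame (row of X) according to (16)."""
    Xc = X - m
    background = m + (Xc @ U) @ U.T   # rows are m + U U^T (x_i - m)
    foreground = X - background
    return background, foreground
```

Multiplying by `U` first keeps the intermediate result of size $n \times k$, which matters when $p$ is the number of pixels. The foreground is by construction orthogonal to the estimated subspace.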
Fig. 2. Experiment on the MNIST data set with digits 1 as true observations $T$ and
all other digits 0, 2, 3, ..., 9 as outliers. Number of recovered PCs is $k = 1$ (left) and
$k = 5$ (right).

Fig. 3. Experiment on the MNIST data set with digits 7 as true observations $T$ and
all other digits 0, 2, 3, ..., 9 as outliers. Number of recovered PCs is $k = 1$ (left) and
$k = 5$ (right).

Fig. 4. Reconstruction errors, i.e., $\| (x_i - m^*) - U^* (U^*)^\top (x_i - m^*) \|_2^2$, on the y-axis,
for each frame on the x-axis for $k = 10$. Note that the person is visible in the scene
from frame 481 until the end. We consider the background images as true data and,
thus, the reconstruction error should be high after frame 481 (when the person enters).
We experimentally compare the performance of all robust PCA methods on
the water surface data set [1], which has moving water in its background. We
choose this dataset of $n = 633$ frames, each of size $p = 128 \times 160 = 20480$, as it is
computationally feasible for all the methods. In Fig. 5, we show the background
subtraction results of several robust PCA algorithms. We optimized the value of
$\lambda$ for PCP of [3], [20] by hand to obtain a good decomposition; how crucial the
choice of $\lambda$ is for this method can be seen from the bottom right pictures of Fig. 5.
Note that both the default version TRPCA(0.5) and TRPCA with ground truth information
achieve reconstruction errors almost identical to those of the true data, cf.
Fig. 4. Hence, TRPCA is the only method which recovers the foreground and
background without mistakes. We refer to the supplementary material for more
explanations regarding this experiment as well as results for another background
subtraction data set. The runtimes of all methods for the water surface data set
are presented in Table 1, which shows that TRPCA is the fastest of all methods.


Table 1. Runtimes for the water surface data set for the algorithms described in
Section 4. For TRPCA/TRPCA(0.5) we report the average time of one initialization
(in practice, 5-10 random restarts are sufficient). For PCP we report the runtime for
the employed parameter $\lambda = 0.001$. For all other methods, it is the time of one full
run of the algorithm including the search for regularization parameters.
3615  875
II:     114  62  4138  3153  67174  90931   -  4230  -
k = 9:  119  92  6371  8508  96954  106782  -  4113  -

5 Conclusion 

We have presented a new method for robust PCA based on the trimmed reconstruction
error. Our efficient algorithm, using fast descent on the Stiefel manifold,
works in the default setting ($t = \lceil n/2 \rceil$) without any free parameters and is
significantly faster than other competing methods. In all experiments TRPCA
performs better than or at least similarly to other robust PCA methods; in particular,
TRPCA solves challenging background subtraction tasks.
[Figure 5: background and foreground panels for frame i = 560, shown for PCA, true PCA, TRPCA, TRPCA(0.5), ORPCA and LLD.]

Fig. 5. Backgrounds and foreground for frame $i = 560$ of the water surface data set.
The last row corresponds to the PCP algorithm with values of $\lambda$ set by hand.
6 Supplementary material: Experiments

In this supplementary material we present additional illustrations of the background
subtraction experiments in Figs. 6-17. We consider the water surface data
set and the moved object^3 data set. For both data sets the frames where no person
is present represent the true data $T$ (background) and frames where the
person is present are considered as outliers $O$.

[Figure 6: original frames 520, 540, 560 and 580 of the water surface data set.]

Fig. 6. Examples of the original frames of the water surface data set. Frames from 1
to 481 contain only background (true data) with a moving water surface. The person
(considered as outlier) enters the scene in frame 482 and is present up to the last frame
633.

^3 See http://research.microsoft.com/en-us/um/people/jckrumm/wallflower/testimages.htm
[Figure 7: background and foreground panels for frame 560, shown for PCA (on X), True PCA (on T), TRPCA, TRPCA(0.5), ORPCA, LLD, HRPCA and RPCA.]

Fig. 7. Background $x_i^b$ and foreground $x_i^f$ recovered with different methods, using (16),
of frame 560 of the water surface data set, number of components $k = 10$. These images
correspond to the ones of Fig. 5, but the scaling has been changed for better visibility.
Namely, all background/foreground images are rescaled so that the maximum and
minimum pixel values are the same (please note the numbers on the color bar); results
for PCP can be found in Fig. 8.
[Figure 8: PCP results for frame 560. Background panels for $\lambda$ = 0.0001 and 0.005; foreground panels for $\lambda$ = 0.0001, 0.001 and 0.006.]

Fig. 8. Background $x_i^b$ and foreground $x_i^f$ recovered, using (16), of frame 560 of the
water surface data set with PCP using different regularization parameters. See similar
results for other methods in Fig. 7.
[Figure 9: original frames 1, 500, 600, 637, 800, 1389, 1390, 1391, 1400, 1501, 1502, 1600 and 1700 of the moved object data set.]

Fig. 9. Examples of the original frames of the moved object data set. Frames from 1 to
637, from 892 to 1389, and from 1503 to 1744 (end) contain only background (true data).
The person (outlier) is visible in the scene from frame 638 to 891 and from frame 1390
to 1502. We refer to frames 0 to 892 in the following as the reduced moved object data
set.
Fig. 10. The reconstruction error of TRPCA/TRPCA(0.5), by analogy with Fig. 4, for
the full moved object data set. The red vertical lines correspond to frames where the
person enters/leaves the scene. We do not perform this experiment on the full dataset
for all other methods given their high runtimes (see Table 1) and instead proceed with
the reduced dataset (see figures below).

Please note also that there is a small change in the background between frames
1 to 637 (B1) and frames 892 to 1389 (B2). Thus the robust PCA components
will capture this difference. This is not a problem for outlier detection (as we can
see from the reconstruction errors of our method above), as this change is still small
compared to the variation when the person enters the scene, but it disturbs the
foreground/background detection of all methods. An offline method could detect the scenes
with small reconstruction error and do the background/foreground decomposition for
each segment separately. The other option would be to use an online estimation procedure
of robust components and center. We do not pursue these directions in this paper,
as the main purpose of these experiments is an illustration of the differences of the
various robust PCA methods in the literature.
[Figure 11: background and foreground panels for frame 651, shown for PCA (on X), True PCA (on T), TRPCA(0.5), TRPCA, ORPCA and HRPCA.]

Fig. 11. Extracted background and foreground of frame 651 of the reduced moved
object data set. The number of components is $k = 10$ (scaled, compare to unscaled
version in Fig. 12).

[Figure 12: background and foreground panels for frame 651, shown for PCA (on X), True PCA (on T), TRPCA(0.5), TRPCA, HRPCA and LLD.]

Fig. 12. Extracted background and foreground of frame 651 of the reduced moved
object data set. The number of components is $k = 10$ (unscaled, compare to scaled
version in Fig. 11).
[Figure 13: background and foreground panels for frame 681, shown for PCA (on X), True PCA (on T), TRPCA(0.5), TRPCA and HRPCA.]

Fig. 13. Extracted background and foreground of frame 681 of the reduced moved
object data set. The number of components is $k = 10$ (scaled, compare to unscaled
version in Fig. 14).

[Figure 14: background and foreground panels for frame 681, shown for PCA (on X), True PCA (on T), TRPCA(0.5), TRPCA, ORPCA, HRPCA, LLD and RPCA.]

Fig. 14. Extracted background and foreground of frame 681 of the reduced moved
object data set. The number of components is $k = 10$ (unscaled, compare to scaled
version in Fig. 13).
[Figure 15: PCP results. Background and foreground of frame 651 for $\lambda$ = 0.001; background and foreground of frame 681 for $\lambda$ = 0.0005.]

Fig. 15. Extracted background and foreground of frames 651 and 681 of the reduced
moved object data set obtained with PCP (scaled, compare to unscaled version in
Fig. 16).

[Figure 16: PCP results. Frame 651: background and foreground for $\lambda$ = 0.0005, background for $\lambda$ = 0.001 and 0.005. Frame 681: background for $\lambda$ = 0.001, foreground for $\lambda$ = 0.005.]

Fig. 16. Extracted background and foreground of frames 651 and 681 of the reduced
moved object data set obtained with PCP (unscaled, compare to scaled version in
Fig. 15).
Fig. 17. Reconstruction errors of different methods on the reduced moved object data
set (analogous to Fig. 4). One can see that TRPCA/TRPCA(0.5) again recovers the
reconstruction errors of the true data almost perfectly, as opposed to all other methods.
However, note that RPCA also does well in having large reconstruction error for all
frames containing the person.











